NSF PAR Search | NSF Public Access Repository


Extracting Dual Solutions via Primal Optimizers

https://doi.org/10.4230/LIPICS.ITCS.2025.29

Carmon, Yair; Jambulapati, Arun; O'Carroll, Liam; Sidford, Aaron (January 2025, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Meka, Raghu (Ed.)
We provide a general method to convert a "primal" black-box algorithm for solving regularized convex-concave minimax optimization problems into an algorithm for solving the associated dual maximin optimization problem. Our method adds recursive regularization over a logarithmic number of rounds where each round consists of an approximate regularized primal optimization followed by the computation of a dual best response. We apply this result to obtain new state-of-the-art runtimes for solving matrix games in specific parameter regimes, obtain improved query complexity for solving the dual of the CVaR distributionally robust optimization (DRO) problem, and recover the optimal query complexity for finding a stationary point of a convex function.
Full Text Available
The Price of Adaptivity in Stochastic Convex Optimization

Carmon, Yair; Hinder, Oliver (September 2024, PMLR)

Full Text Available
Accelerated Parameter-Free Stochastic Optimization

Kreisler, Itai; Ivgi, Maor; Hinder, Oliver; Carmon, Yair (September 2024, Proceedings of the Thirty Seventh Conference on Learning Theory, PMLR 247:3257-3324)

Full Text Available
A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions

Carmon, Yair; Jambulapati, Arun; Jin, Yujia; Sidford, Aaron (January 2024, Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA))

Full Text Available
A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions

Carmon, Yair; Jambulapati, Arun; Jin, Yujia; Sidford, Aaron (January 2024, SIAM)

Full Text Available
DoG is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule

Ivgi, Maor; Hinder, Oliver; Carmon, Yair (July 2023, Proceedings of the 40th International Conference on Machine Learning)

We propose a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from the initial point and norms of gradients) and have no “learning rate” parameter. Theoretically, we show that, for stochastic convex optimization, a slight variation of the DoG formula enjoys strong, high-probability parameter-free convergence guarantees and iterate movement bounds. Empirically, we consider a broad range of vision and language transfer learning tasks, and show that DoG’s performance is close to that of SGD with tuned learning rate. We also propose a per-layer variant of DoG that generally outperforms tuned SGD, approaching the performance of tuned Adam. A PyTorch implementation of our algorithms is available at https://github.com/formll/dog.
Full Text Available
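
The DoG abstract above describes a step size built from two empirical quantities: the distance from the initial point and the norms of the gradients seen so far. A minimal sketch of that idea follows, assuming the basic form "largest distance from the start so far, divided by the square root of the running sum of squared gradient norms". The function name `dog_sgd`, the parameter `r_eps`, and its default value are illustrative assumptions, not the API of the authors' PyTorch implementation, and this is not the "slight variation" the abstract says is analyzed theoretically.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def dog_sgd(grad, x0, steps, r_eps=1e-4):
    """Gradient descent with a DoG-style step size (illustrative sketch).

    grad  : callable returning the (possibly stochastic) gradient at x
    x0    : initial point, as a list of floats
    r_eps : assumed small seed for the initial "distance", so that the
            first step is nonzero; note that it is not a learning rate
    """
    x = list(x0)
    max_dist = r_eps * (1.0 + norm(x0))  # seed distance (assumption)
    grad_sq_sum = 0.0                    # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += norm(g) ** 2
        if grad_sq_sum == 0.0:
            break  # all gradients so far are zero, nothing to do
        # Largest distance from the initial point observed so far.
        dist = norm([xi - x0i for xi, x0i in zip(x, x0)])
        max_dist = max(max_dist, dist)
        # Distance over gradients: no tuned learning rate appears here.
        eta = max_dist / math.sqrt(grad_sq_sum)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Usage: minimize f(x) = 0.5 * ||x - (1, -2)||^2 starting from the origin.
x_final = dog_sgd(lambda x: [x[0] - 1.0, x[1] + 2.0], [0.0, 0.0], steps=2000)
```

In this sketch the step size starts tiny, grows as the iterates move away from the initial point, and is damped as gradient norms accumulate, which is how the schedule can work without a tuned learning rate.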
ReSQueing Parallel and Private Stochastic Convex Optimization

https://doi.org/10.1109/FOCS57990.2023.00124

Carmon, Yair; Jambulapati, Arun; Jin, Yujia; Lee, Yin Tat; Liu, Daogao; Sidford, Aaron; Tian, Kevin (November 2023, IEEE)

Full Text Available
DataComp-LM: In search of the next generation of training sets for language models

Li, Jeffrey; Fang, Alex; Smyrnis, Georgios; Ivgi, Maor; Jordan, Matt; Gadre, Samir; Bansal, Hritik; Guha, Etash; Keh, Sedrick; Arora, Kushal; et al (April 2025, https://doi.org/10.48550/arXiv.2406.11794)

The authors introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments aimed at improving language models. DCLM provides a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants can experiment with dataset curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline, the authors find that model-based filtering is critical for assembling a high-quality training set. Their resulting dataset, DCLM-Baseline, enables training a 7B parameter model from scratch to achieve 64% 5-shot accuracy on MMLU with 2.6T training tokens. This represents a 6.6 percentage point improvement over MAP-Neo (the previous state-of-the-art in open-data LMs), while using 40% less compute. The baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% and 66%), and performs similarly on an average of 53 NLU tasks, while using 6.6x less compute than Llama 3 8B. These findings emphasize the importance of dataset design for training LMs and establish a foundation for further research on data curation.
Free, publicly-accessible full text available April 21, 2026
RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

Carmon, Yair; Jambulapati, Arun; Jin, Yujia; Sidford, Aaron (January 2022, International Conference on Machine Learning (ICML))

Full Text Available
Optimal and Adaptive Monteiro-Svaiter Acceleration

Carmon, Yair; Hausler, Danielle; Jambulapati, Arun; Jin, Yujia; Sidford, Aaron (January 2022, Advances in Neural Information Processing Systems 35 (NeurIPS 2022))

Full Text Available
